48 research outputs found

    Dinucleotide controlled null models for comparative RNA gene prediction

    Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available.
    Results: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content.
    Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered.
    Availability: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
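To make the term "average dinucleotide content" concrete, the following is a minimal Python sketch that measures it for a small alignment. It is not part of SISSIz and does not reproduce its simulation algorithm. The function names are hypothetical, and the choice to drop gap characters before counting adjacent pairs is an assumption made here for illustration.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Relative frequencies of overlapping dinucleotides in one sequence.

    Assumption: gap characters ('-') are removed first, so residues on
    either side of a gap are treated as adjacent.
    """
    seq = seq.upper().replace("-", "")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def alignment_dinucleotide_content(alignment):
    """Average the per-row dinucleotide frequencies over all rows."""
    summed = Counter()
    for row in alignment:
        for pair, freq in dinucleotide_frequencies(row).items():
            summed[pair] += freq
    return {pair: total / len(alignment) for pair, total in summed.items()}


if __name__ == "__main__":
    toy_alignment = ["ACGT", "AC-GT"]  # hypothetical two-row alignment
    print(alignment_dinucleotide_content(toy_alignment))
```

A null model in the sense of the abstract would need randomized alignments whose values from a measurement like this match those of the original alignment on average.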

    DNA word analysis based on the distribution of the distances between symmetric words

    We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expecte
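As an illustration of the kind of quantity described above, here is a minimal Python sketch that tabulates distances between a word and its reversed complement in a sequence. It is not the authors' procedure. The function names are hypothetical, and the distance definition (from the start of each word occurrence to the start of the nearest reversed complement that follows it) is an assumption made for this example.

```python
from bisect import bisect_right
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reversed complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def occurrences(seq, word):
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def symmetric_distance_counts(seq, word):
    """Count distances from each occurrence of word to the nearest
    following occurrence of its reversed complement (assumed definition)."""
    rc_positions = occurrences(seq, reverse_complement(word))
    counts = Counter()
    for pos in occurrences(seq, word):
        i = bisect_right(rc_positions, pos)  # first partner strictly after pos
        if i < len(rc_positions):
            counts[rc_positions[i] - pos] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "ACGAAATCGTACGTTCGT"  # hypothetical input
    print(symmetric_distance_counts(toy_sequence, "ACG"))
```

Comparing such an empirical distribution against an expected one is what would reveal overrepresented distances; that statistical step is not shown here.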

    Epigenetic Regulation of HIV-1 Latency by Cytosine Methylation

    Human immunodeficiency virus type 1 (HIV-1) persists in a latent state within resting CD4+ T cells of infected persons treated with highly active antiretroviral therapy (HAART). This reservoir must be eliminated for the clearance of infection. Using a cDNA library screen, we have identified methyl-CpG binding domain protein 2 (MBD2) as a regulator of HIV-1 latency. Two CpG islands flank the HIV-1 transcription start site and are methylated in latently infected Jurkat cells and primary CD4+ T cells. MBD2 and histone deacetylase 2 (HDAC2) are found at one of these CpG islands during latency. Inhibition of cytosine methylation with 5-aza-2′deoxycytidine (aza-CdR) abrogates recruitment of MBD2 and HDAC2. Furthermore, aza-CdR potently synergizes with the NF-κB activators prostratin or TNF-α to reactivate latent HIV-1. These observations confirm that cytosine methylation and MBD2 are epigenetic regulators of HIV-1 latency. Clearance of HIV-1 from infected persons may be enhanced by inclusion of DNA methylation inhibitors, such as aza-CdR, and NF-κB activators into current antiviral therapies

    The Proteomic Code: a molecular recognition code for proteins

    Background: The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code.
    Review: The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding.
    Methods and conclusions: A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.

    The ancient history of the structure of ribonuclease P and the early origins of Archaea
